
    3D Car Shape Reconstruction from a Single Sketch Image

    Efficient car shape design is a challenging problem in both the automotive industry and the computer animation/games industry. In this paper, we present a system to reconstruct the 3D car shape from a single 2D sketch image. To learn the correlation between 2D sketches and 3D cars, we propose a Variational Autoencoder deep neural network that takes a 2D sketch and generates a set of multi-view depth and mask images, which are a more effective representation compared to a 3D mesh and can be combined to form the 3D car shape. To ensure the volume and diversity of the training data, we propose a feature-preserving car mesh augmentation pipeline for data augmentation. Since deep learning has limited capacity to reconstruct fine-detail features, we propose a lazy learning approach that constructs a small subspace based on a few relevant car samples in the database. Due to the small size of such a subspace, fine details can be represented effectively with a small number of parameters. With a low-cost optimization process, a high-quality car with detailed features is created. Experimental results show that the system performs consistently, creating highly realistic cars of substantially different shapes and topologies at a very low computational cost.
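    As a rough illustration of the sketch-to-multi-view idea described in this abstract, the following is a minimal PyTorch-style sketch of a VAE that maps a single sketch image to a stack of per-view depth and mask images. All layer sizes, the number of views, and every name here are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch (not the authors' code): a VAE mapping a single
# 2D sketch (1 x 128 x 128) to multi-view depth and mask images.
import torch
import torch.nn as nn

class SketchToMultiViewVAE(nn.Module):
    def __init__(self, num_views=20, latent_dim=128):
        super().__init__()
        self.num_views = num_views
        # Encoder: sketch image -> parameters of a latent Gaussian.
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 32, 4, stride=2, padding=1), nn.ReLU(),    # 64x64
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),   # 32x32
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(),  # 16x16
            nn.Flatten(),
        )
        self.fc_mu = nn.Linear(128 * 16 * 16, latent_dim)
        self.fc_logvar = nn.Linear(128 * 16 * 16, latent_dim)
        # Decoder: latent code -> num_views x (depth + mask) images.
        self.fc_dec = nn.Linear(latent_dim, 128 * 16 * 16)
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, num_views * 2, 4, stride=2, padding=1),
        )

    def forward(self, sketch):
        h = self.encoder(sketch)
        mu, logvar = self.fc_mu(h), self.fc_logvar(h)
        # Reparameterization trick: sample z ~ N(mu, sigma^2).
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        out = self.decoder(self.fc_dec(z).view(-1, 128, 16, 16))
        depth = out[:, :self.num_views]                 # per-view depth maps
        mask = torch.sigmoid(out[:, self.num_views:])   # per-view silhouettes
        return depth, mask, mu, logvar
```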

    Single Sketch Image based 3D Car Shape Reconstruction with Deep Learning and Lazy Learning

    Efficient car shape design is a challenging problem in both the automotive industry and the computer animation/games industry. In this paper, we present a system to reconstruct the 3D car shape from a single 2D sketch image. To learn the correlation between 2D sketches and 3D cars, we propose a Variational Autoencoder deep neural network that takes a 2D sketch and generates a set of multi-view depth and mask images, which form a more effective representation compared to 3D meshes and can be effectively fused to generate a 3D car shape. Since global models like deep learning have limited capacity to reconstruct fine-detail features, we propose a local lazy learning approach that constructs a small subspace based on a few relevant car samples in the database. Due to the small size of such a subspace, fine details can be represented effectively with a small number of parameters. With a low-cost optimization process, a high-quality car shape with detailed features is created. Experimental results show that the system performs consistently, creating highly realistic cars of substantially different shapes and topologies.
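    The local lazy-learning refinement this abstract describes can be sketched as follows: retrieve a few relevant car samples, span a small subspace with them, and fit the subspace's handful of coefficients to the coarse reconstruction. The distance metric, PCA construction, and all names below are assumptions for exposition, not the paper's actual procedure.

```python
# Illustrative sketch of the local "lazy learning" step: build a small
# PCA subspace from the k most relevant car samples, then fit its few
# coefficients to the coarse reconstruction via least squares.
import numpy as np

def lazy_refine(coarse, database, k=8):
    """coarse: (D,) flattened coarse shape; database: (N, D) flattened cars."""
    # 1. Retrieve the k nearest samples (Euclidean distance as a stand-in).
    dists = np.linalg.norm(database - coarse, axis=1)
    neighbors = database[np.argsort(dists)[:k]]            # (k, D)
    # 2. Build a local subspace via PCA on the neighbors.
    mean = neighbors.mean(axis=0)
    _, _, basis = np.linalg.svd(neighbors - mean, full_matrices=False)
    # 3. Low-cost optimization: project onto the subspace; only k
    #    parameters are needed to represent fine details locally.
    coeffs = basis @ (coarse - mean)
    return mean + coeffs @ basis                           # refined shape
```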

    Event-based Camera Simulation using Monte Carlo Path Tracing with Adaptive Denoising

    This paper presents an algorithm to obtain an event-based video from noisy frames given by physics-based Monte Carlo path tracing over a synthetic 3D scene. Given the nature of the dynamic vision sensor (DVS), rendering event-based video can be viewed as a process of detecting changes in noisy brightness values. We extend a denoising method based on weighted local regression (WLR) to detect the brightness changes rather than applying denoising to every pixel. Specifically, we derive a threshold to determine the likelihood of event occurrence and reduce the number of times the regression is performed. Our method is robust to noisy video frames obtained from a few path-traced samples. Despite its efficiency, our method performs comparably to or even better than an approach that exhaustively denoises every frame.
    Comment: 8 pages, 6 figures, 3 tables
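    For context on the event model underlying this abstract, here is a minimal sketch of DVS-style event generation from a frame sequence: an event fires where the log-brightness change exceeds a contrast threshold. The paper detects such changes on noisy path-traced frames via weighted local regression; this toy version uses a plain per-pixel reference, and all names and defaults are assumptions.

```python
# Illustrative sketch: generate DVS-style events by thresholding
# per-pixel log-brightness changes against a reference frame.
import numpy as np

def frames_to_events(frames, times, contrast=0.2, eps=1e-6):
    """frames: (T, H, W) brightness; returns (t, x, y, polarity) events."""
    log_ref = np.log(frames[0] + eps)        # reference log-brightness
    events = []
    for t in range(1, len(frames)):
        log_i = np.log(frames[t] + eps)
        diff = log_i - log_ref
        fired = np.abs(diff) >= contrast     # event condition |dlog I| >= C
        ys, xs = np.nonzero(fired)
        for y, x in zip(ys, xs):
            events.append((times[t], x, y, int(np.sign(diff[y, x]))))
        # Update the reference only where events fired (standard DVS behavior).
        log_ref[fired] = log_i[fired]
    return events
```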

    Enhancing Perception and Immersion in Pre-Captured Environments through Learning-Based Eye Height Adaptation

    Pre-captured immersive environments using omnidirectional cameras provide a wide range of virtual reality applications. Previous research has shown that manipulating the eye height in egocentric virtual environments can significantly affect distance perception and immersion. However, the influence of eye height in pre-captured real environments has received less attention due to the difficulty of altering the perspective after finishing the capture process. To explore this influence, we first propose a pilot study that captures real environments with multiple eye heights and asks participants to judge the egocentric distances and immersion. If a significant influence is confirmed, an effective image-based approach to adapt pre-captured real-world environments to the user's eye height would be desirable. Motivated by the study, we propose a learning-based approach for synthesizing novel views for omnidirectional images with altered eye heights. This approach employs a multitask architecture that learns depth and semantic segmentation in two formats, and generates high-quality depth and semantic segmentation to facilitate the inpainting stage. With the improved omnidirectional-aware layered depth image, our approach synthesizes natural and realistic visuals for eye height adaptation. Quantitative and qualitative evaluation shows favorable results against state-of-the-art methods, and an extensive user study verifies improved perception and immersion for pre-captured real-world environments.
    Comment: 10 pages, 13 figures, 3 tables, submitted to ISMAR 202
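    The geometric core of eye height adaptation can be sketched independently of the learning stages: back-project an equirectangular RGB-D panorama to 3D, translate the camera vertically, and re-project. The learning-based depth, segmentation, and inpainting stages are omitted here, and the equirectangular conventions and names are assumptions, not the paper's pipeline.

```python
# Illustrative sketch: forward-warp an equirectangular RGB-D panorama
# to a camera moved vertically by delta_h meters.
import numpy as np

def warp_eye_height(rgb, depth, delta_h):
    H, W, _ = rgb.shape
    # Pixel grid -> spherical directions (equirectangular convention).
    v, u = np.mgrid[0:H, 0:W]
    theta = (u / W) * 2 * np.pi - np.pi              # longitude
    phi = (0.5 - v / H) * np.pi                      # latitude
    dirs = np.stack([np.cos(phi) * np.sin(theta),
                     np.sin(phi),
                     np.cos(phi) * np.cos(theta)], axis=-1)
    # Back-project to 3D, shift the camera up/down, re-project.
    pts = dirs * depth[..., None]
    pts[..., 1] -= delta_h                           # new camera origin
    r = np.maximum(np.linalg.norm(pts, axis=-1), 1e-6)
    theta2 = np.arctan2(pts[..., 0], pts[..., 2])
    phi2 = np.arcsin(np.clip(pts[..., 1] / r, -1, 1))
    u2 = np.clip(((theta2 + np.pi) / (2 * np.pi) * W).astype(int), 0, W - 1)
    v2 = np.clip(((0.5 - phi2 / np.pi) * H).astype(int), 0, H - 1)
    out = np.zeros_like(rgb)
    out[v2, u2] = rgb      # naive splat; disocclusion holes need inpainting
    return out
```

    The naive splat leaves holes at disocclusions, which is precisely where the paper's layered depth image and inpainting stages come in.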

    Improving the Gap in Visual Speech Recognition Between Normal and Silent Speech Based on Metric Learning

    This paper presents a novel metric learning approach to address the performance gap between normal and silent speech in visual speech recognition (VSR). The difference in lip movements between the two poses a challenge for existing VSR models, which exhibit degraded accuracy when applied to silent speech. To solve this issue and tackle the scarcity of training data for silent speech, we propose to leverage the shared literal content between normal and silent speech and present a metric learning approach based on visemes. Specifically, we aim to map the input of two speech types close to each other in a latent space if they have similar viseme representations. By minimizing the Kullback-Leibler divergence of the predicted viseme probability distributions between and within the two speech types, our model effectively learns and predicts viseme identities. Our evaluation demonstrates that our method improves the accuracy of silent VSR, even when limited training data is available.
    Comment: Accepted by INTERSPEECH 202
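    A minimal sketch of the cross-speech-type KL objective described in this abstract follows: pull the viseme probability distributions predicted for normal and silent utterances of the same content toward each other. Using a symmetric KL and frame-level logits is an assumption for illustration; the paper's exact loss formulation may differ.

```python
# Illustrative sketch: symmetric KL divergence between viseme probability
# distributions predicted for normal and silent speech of the same content.
import torch.nn.functional as F

def viseme_kl_loss(logits_normal, logits_silent):
    """logits_*: (B, T, num_visemes) frame-level viseme predictions."""
    log_p = F.log_softmax(logits_normal, dim=-1)
    log_q = F.log_softmax(logits_silent, dim=-1)
    # F.kl_div(input, target) computes KL(target || exp(input)).
    kl_pq = F.kl_div(log_q, log_p.exp(), reduction="batchmean")  # KL(p || q)
    kl_qp = F.kl_div(log_p, log_q.exp(), reduction="batchmean")  # KL(q || p)
    return 0.5 * (kl_pq + kl_qp)
```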

    Automatic Dance Generation System Considering Sign Language Information

    In recent years, thanks to the development of 3DCG animation editing tools (e.g. MikuMikuDance), many 3D character dance animations have been created by amateur users. However, it is very difficult to create choreography from scratch without technical knowledge. Shiratori et al. [2006] produced an automatic dance generation system that considers the rhythm and intensity of dance motions. However, each segment is selected randomly from a database, so the generated dance motion carries no linguistic or emotional meaning. Takano et al. [2010] produced a human motion generation system that considers motion labels. However, they use simple motion labels like "running" or "jump", so they cannot generate motions that express emotions. In reality, professional dancers create choreography based on musical features or the lyrics of a song, expressing emotion or how they feel about the music. In our work, we aim to generate more emotional dance motion easily, using the linguistic information in lyrics to drive the motion. In this paper, we propose a system that generates sign dance motion from continuous sign language motion based on the lyrics of a piece of music. This system could help deaf users "listen to" music as a visualized music application.
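    At its simplest, the lyrics-to-motion pipeline this abstract outlines could look like the toy sketch below: look up sign-language motion clips for keywords extracted from the lyrics and concatenate them, cross-fading at the seams. The data layout, blending scheme, and names are hypothetical; the actual system additionally aligns segments to the music's rhythm.

```python
# Illustrative sketch (hypothetical data layout): concatenate sign-language
# motion clips matching lyric keywords, cross-fading adjacent clips.
import numpy as np

def lyrics_to_sign_dance(lyric_words, motion_db, blend=10):
    """motion_db: dict mapping a word to a (frames, joints*3) motion clip."""
    clips = [motion_db[w] for w in lyric_words if w in motion_db]
    if not clips:
        return np.empty((0, 0))
    seq = clips[0]
    for clip in clips[1:]:
        n = min(blend, len(seq), len(clip))
        w = np.linspace(0, 1, n)[:, None]          # cross-fade weights
        seam = (1 - w) * seq[-n:] + w * clip[:n]   # blend the overlap
        seq = np.vstack([seq[:-n], seam, clip[n:]])
    return seq
```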